[SOUND]
Hello,
welcome to the course on
Text Retrieval and Search Engines.
I'm ChengXiang Zhai.
I have a nickname Cheng.
I'm a professor of the Department
of Computer Science at
the University of Illinois
at Urbana-Champaign.
This first lecture is a basic
introduction to the course,
a brief introduction to what
we'll cover in the course.
We're going to first talk about the data
mining specialization since this course is
part of that specialization.
And then we'll cover the motivation
and objectives of the course.
This will be followed by pre-requisites
and course format and reference books.
And then finally we'll talk
about the course schedule,
which has a number of topics to be
covered in the rest of this course.
So the data mining specialization
offered by the University of Illinois
at Urbana-Champaign is really to address
the need for data mining techniques to
handle a lot of big data,
to turn the big data into knowledge.
There are five lecture-based courses,
as you see on the slide,
plus one capstone
project course at the end.
I'm teaching two of them, which are
this course, Text Retrieval and
Search Engines, and this one.
So the two courses that I cover
here are all about the text data.
In contrast, the other courses are
covering more general techniques that can
be applied to all kinds of data.
So Pattern Discovery, taught by
Professor Jiawei Han, and Cluster Analysis,
again taught by him, are about general data
mining techniques to handle structured
and unstructured text data.
And Data Visualization,
covered by Professor John Hart, is about
general visualization techniques,
again applicable to all kinds of data.
So the motivation for this course,
and in fact also for
the other course that I'm teaching,
is that we have a lot of text data.
And the data is everywhere
and is growing rapidly, so
you must have been
experiencing this growth.
Just think about how much text data
you're dealing with every day.
I listed some data types here. For
example, on the internet we see a lot
of web pages, news articles, et cetera.
And then we have blog articles,
emails, scientific literature,
and tweets. As we are speaking,
maybe a lot of tweets are being written,
and a lot of emails are being sent.
So, the amount of text data is beyond
our capacity to understand it.
Also, the amount of data makes it possible
to actually analyze the data to discover
interesting knowledge and that's what
we mean by harnessing big text data.
[MUSIC]

